Hygiene survey data on pollutant exposure can be analyzed with an analysis of variance (ANOVA) model with a random worker effect. Typically, workers are 
classified into homogeneous exposure groups, so it is very common to obtain a zero or negative ANOVA estimate of the between-worker variance ($\sigma_B^2$). 
Negative estimates are not sensible and also pose problems for estimating the probability ($\theta$) that in a job group, a randomly selected worker's mean exposure 
exceeds the occupational exposure standard. Therefore, Rappaport et al. suggested replacing a non-positive estimate with an approximate one-sided 
60% upper confidence bound. This article develops an alternative estimator, based on the upper tolerance interval suggested by Wang and Iyer. We compared 
the performance of the two methods using real data and simulations with respect to estimating both the between-worker variance and the probability of 
overexposure in balanced designs. We found that the method of Rappaport et al. has three main disadvantages: (i) the estimated $\sigma_B^2$ remains negative for some 
data sets; (ii) the estimator performs poorly in estimating $\sigma_B^2$ and $\theta$ with two repeated measures per worker and when the true $\sigma_B^2$ is quite small, which are quite 
common situations when studying exposure; (iii) the estimator can be extremely sensitive to small changes in the data. Our alternative estimator offers a 
solution to these problems. Journal of Exposure Analysis and Environmental Epidemiology (2001) 11, 414-421. 
Introduction 

Recently, analysis of variance (ANOVA) random effects 
models have been applied to data sets consisting of repeated 
measurements of pollutants within factories in order to 
identify determinants of exposure and estimate within- and 
between-worker variance components. The within-worker 
variance in these studies reflects day-to-day variations in 
the levels of exposure to pollutants, which often vary 
greatly. Between-worker variance, on the other hand, is 
often rather small due to the use of homogeneous exposure 
groups. Thus, the variance ratio $\lambda$ (= $\sigma_B^2/\sigma_W^2$) may be quite 
small. As a result, when analyzing data using ANOVA 
random effects models, it is very common to obtain a zero or 
negative estimate of the between-worker variance. In many 
applications, it is common practice to report such negative 
values as zeros. 

The occurrence of negative or zero between-worker 
ANOVA variance estimates causes a number of problems. 
First, zero between-worker variance appears to be an 
unrealistic result since it implies that all workers have the 
same mean exposure. This contradicts common industrial 
hygiene experience. Furthermore, in exposure assessment in 
epidemiological studies and for hazard control, the 
probability $\theta$ of overexposure is often of more interest than 
the variance components themselves. This is the probability 
that in a job group, a randomly selected worker's mean 
exposure exceeds the occupational exposure standard, 
where the worker's mean exposure is relevant to the risk 
of chronic adverse health effects (Rappaport et al., 1995). 
The probability of overexposure depends on both $\sigma_B^2$ and 
$\sigma_W^2$. Common practice is to adopt a "plug-in" approach in 
which $\sigma_B^2$ and $\sigma_W^2$ are estimated and their estimates are 
inserted into the formula for $\theta$. This approach is impossible 
to employ when the estimate of $\sigma_B^2$ is zero or negative. 
Finally, the variance ratio should have implications for 
planning future sampling design. Small variance ratios 
imply that it may be advantageous to sample fewer 
individuals but at more time points. 

The estimation of the probability of overexposure (point 
estimator) becomes meaningless when a zero or negative 
between-worker variance estimate appears. Therefore, it 
was suggested by Rappaport et al. (1995) to replace a 
negative or zero estimate with an approximate one-sided 
60% upper bound, as derived from formulas of Williams 
and cited in Searle et al. (1992). This practice is based on 
empirical evidence that such a procedure has minimal 
impact on significance levels and statistical power. This 
proposal does have some drawbacks. Many negative 
ANOVA estimates are not adjusted to positive values and 
the estimator is very sensitive to small changes in the data. 

This article develops an alternative — the bias-corrected 
variance component estimator — based on the upper 
tolerance interval suggested by Wang and Iyer (1994) to 
deal with the problem of negative variance component 
estimates. We compare the performance of the two methods 
using real data and simulations, focusing on the estimation 
of probabilities of overexposure (beyond standards) in 
balanced designs. 

ANOVA method 

We briefly review the ANOVA, or least squares (LS), 
method for estimating variance components in a balanced 
one-way random effects model. We denote: $k$ = number of 
subjects in a group; $n$ = number of repeated measurements 
obtained from each subject in the group: 

$MSW = SSW/(k(n - 1))$; $MSB = SSB/(k - 1)$; 

$F = MSB/MSW$ 

The estimators of the between-subject ($\sigma_B^2$) and 
within-subject ($\sigma_W^2$) variance components are: 

$\hat{\sigma}_B^2 = [MSB - MSW]/n$; $\hat{\sigma}_W^2 = MSW$ 

For more details, see Searle et al. (1992). 
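
As a minimal sketch of the estimators above (not the authors' code), the following Python function computes MSB, MSW, F and the two ANOVA variance component estimates for a balanced data set. The function name and the input layout (one list of log-exposures per worker) are assumptions made for illustration.

```python
def anova_variance_components(data):
    """ANOVA (least squares) estimates for a balanced one-way random effects model.

    data: list of k lists, each holding the n repeated (log) measurements
          of one worker. Layout and names are illustrative assumptions.
    """
    k = len(data)
    n = len(data[0])
    if any(len(row) != n for row in data):
        raise ValueError("a balanced design (same n for every worker) is required")

    worker_means = [sum(row) / n for row in data]
    grand_mean = sum(worker_means) / k

    # Between- and within-worker sums of squares
    ssb = n * sum((m - grand_mean) ** 2 for m in worker_means)
    ssw = sum((y - m) ** 2 for row, m in zip(data, worker_means) for y in row)

    msb = ssb / (k - 1)
    msw = ssw / (k * (n - 1))
    return {
        "MSB": msb,
        "MSW": msw,
        "F": msb / msw,
        "var_between": (msb - msw) / n,  # can be zero or negative
        "var_within": msw,
    }


# Three workers with identical means: MSB = 0, so the estimate is negative.
example = anova_variance_components([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
print(example["var_between"])
```

The toy input is deliberately extreme: all worker means coincide, so the between-worker estimate equals $-MSW/n$, which illustrates how a negative value arises.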

An example from real data: lead exposure 

Nineteen workers at two car battery producers in Israel 
were repeatedly measured to study their annual exposure to 
lead. They were randomly selected (9 workers in the first 
factory and 10 in the second) to represent those exposed 
to the main processes (details can be found elsewhere; 
Peretz et al., 1997). Ten hygiene surveys, with intervals of 
3-7 weeks, were performed in each factory over the course 
of a year. Due to missing data (absence of workers, etc.), 
each worker had 6-10 repeated measures. We have taken 
the first six measures of each worker, and estimated the 
variance components $\sigma_B^2$ and $\sigma_W^2$ for each factory. According to 
Israel’s regulations for factories with exposure to lead, it is 
mandatory to conduct two hygiene surveys each year. 

In order to highlight the sensitivity of the $\sigma_B^2$ estimator, 
we have created new data sets, each including just two 
repeated measures out of the six. In total, we had 15 sets of 
data with two repetitions for each factory. The exposure 
level was taken as a log transformation of the TLV^1 fraction 
(log(concentration/TLV)) (Peretz et al., 1997). The 
TLV-TWA^2 standard for occupational lead exposure 
according to Israel's Regulations is 0.1 mg/m^3. 

1 TLV = threshold limit value; a health-based concentration to which nearly 
all workers may be exposed without adverse effect. 

Table 1A shows summary measures of the estimators in 
each factory, in comparison to the original estimators 
("accurate") based on six repetitions. It can be seen that a 
negative $\sigma_B^2$ estimate resulted from 40% of the series in the 
first factory (with true $\lambda$ = .17) and from 20% of the series in 
the second factory (with true $\lambda$ = .09). In addition, the 
ANOVA estimators for $\lambda$ were quite poor. This reinforces 
the importance of performing more than two repeated 
surveys per year. In practice, though, many surveys are 
limited to two measurements, as mentioned above for lead 
exposure. So the example also highlights the need for 
statistical methods that can cope with small samples. Table 1B 
shows summary measures of the estimators if four repeated 
surveys were performed in each factory. One can see the 
improvement in the estimation when doubling the number 
of repeated measurements per subject. The MSE 
(= $[\mathrm{mean}(\hat{\sigma}_B^2) - \sigma_B^2]^2 + \mathrm{var}(\hat{\sigma}_B^2)$) is reduced by about 75% in 
the two factories. 
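
The semi-simulated series can be thought of as every way of choosing two (or four) of the six available measurement occasions, which gives 15 subsets in both cases. The sketch below enumerates those subsets and computes an MSE of the form bias squared plus variance. The helper names are hypothetical, and the use of the population variance is an assumption, since the text does not say which variance formula was used.

```python
from itertools import combinations
from statistics import mean, pvariance


def measurement_subsets(n_total=6, n_keep=2):
    """All index subsets of size n_keep out of n_total measurement occasions."""
    return list(combinations(range(n_total), n_keep))


def mse(estimates, reference):
    """Squared bias plus variance of a set of estimates around a reference value.

    Assumption: population variance (divide by the number of estimates).
    """
    bias = mean(estimates) - reference
    return bias ** 2 + pvariance(estimates)


print(len(measurement_subsets(6, 2)))  # number of two-measurement series
print(len(measurement_subsets(6, 4)))  # number of four-measurement series
```

Applying an ANOVA estimator to each subset and passing the resulting estimates to `mse`, with the six-repetition estimate as the reference, would mirror the comparison summarized in Table 1.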


Estimating $\theta$ in the presence of a negative ANOVA 
estimate of $\sigma_B^2$ 

Overexposure 

For hazard control, the probability $\theta$ of overexposure is 
very important. We present here the basic equations for 
overexposure as derived by Rappaport et al. (1995). 
They followed the common assumption that the exposure 
$X_{ij}$ of worker $i$ on day $j$ follows a lognormal distribution 
with: 

$Y_{ij} = \ln(X_{ij}) = \mu_y + \alpha_i + \varepsilon_{ij}$ 

where $\mu_y$ is the mean of the overall logged exposure 
distribution in the group, $\alpha_i$ is a random effect for the $i$th 
worker and $\varepsilon_{ij}$ is the within-worker random error. 

It was furthermore assumed that: $\alpha_i \sim N(0, \sigma_B^2)$, where the 
$\alpha_i$'s are all independent; $\varepsilon_{ij} \sim N(0, \sigma_W^2)$, where the $\varepsilon_{ij}$'s are all 
independent. $\sigma^2 = \sigma_W^2 + \sigma_B^2$; $\sigma_B^2$ = variance between workers; 
$\sigma_W^2$ = variance within workers. 

This model is applied to homogeneous work groups 
consisting of workers who perform similar tasks and 
therefore should have similar exposures. A worker is 
considered overexposed if his mean value $\mu_{x_i}$ (conditional 
on $\alpha_i$) exceeds a standard limit ($S$). The probability $\theta$ that a 
randomly selected person from a work group is overexposed 
is thus: 

$\theta = P\{\mu_{x_i} > S\} = P\{Z > [\ln(S) - \mu_y - \sigma_W^2/2]/\sigma_B\}$ (1) 

where $Z$ denotes a standard normal random variable. 
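
The plug-in calculation can be sketched as follows, assuming Eq. (1) has the lognormal form given by Rappaport et al. (1995), $\theta = P\{Z > [\ln(S) - \mu_y - \sigma_W^2/2]/\sigma_B\}$ with $Z$ standard normal. The function and argument names are illustrative, not taken from the article.

```python
from math import log, sqrt
from statistics import NormalDist


def prob_overexposure(mu_y, var_between, var_within, standard):
    """Probability that a random worker's mean exposure exceeds the standard.

    Assumes the lognormal random effects model, in which worker i's mean
    exposure is exp(mu_y + alpha_i + var_within / 2).
    """
    if var_between <= 0:
        # The plug-in formula is undefined for zero or negative estimates,
        # which is the problem the article addresses.
        raise ValueError("between-worker variance must be positive")
    z = (log(standard) - mu_y - var_within / 2) / sqrt(var_between)
    return 1 - NormalDist().cdf(z)


# Illustrative (made-up) values on the log scale:
print(prob_overexposure(mu_y=-1.0, var_between=0.2, var_within=1.0, standard=1.0))
```

The explicit error for a non-positive variance makes the limitation visible: a zero or negative ANOVA estimate cannot be inserted into the formula without some modification.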


2 TLV-TWA = threshold limit value, with respect to the 8-h time-weighted 
average, that should not be exceeded during any part of the working day. 
Table 1. Summary measures of ANOVA (LS) estimators: mean, SD (min, max) on semi-simulated data sets. 

Series       Number of series   $\hat{\sigma}_B^2$               $\hat{\lambda}$ 

(A) n^a = 2 

Factory 1, k^b = 9 
Accurate                        .09                        .17 (= .09/.53) 
Total        15                 .10, .23 (-.23, .46)       .35, .57 (-.33, 1.35) 
Positive     9                  .26, .14 (.00, .46)        .72, .42 (.01, 1.35) 
Negative     6                  -.14, .08 (-.23, -.04)     -.20, .11 (-.33, -.05) 

Factory 2, k^b = 10 
Accurate                        .11                        .09 (= .11/1.29) 
Total        15                 .13, .57 (-.59, .65)       .24, .36 (-.27, .88) 
Positive     12                 .29, .20 (.03, .65)        .36, .30 (.03, .88) 
Negative     3                  -.48, .10 (-.59, -.42)     -.23, .04 (-.27, -.20) 

(B) n^a = 4 

Factory 1, k^b = 9 
Accurate                        .09                        .17 (= .09/.53) 
Total        15                 .09, .08 (-.03, .24)       .19, .18 (-.06, .58) 
Positive     12                 .12, .07 (.03, .24)        .25, .16 (.06, .58) 
Negative     3                  -.02, .01 (-.03, -.01)     -.04, .02 (-.06, -.02) 

Factory 2, k^b = 10 
Accurate                        .11                        .09 (= .11/1.29) 
Total        15                 .12, .12 (-.09, .29)       .11, .12 (-.05, .38) 
Positive     12                 .16, .09 (.00, .29)        .14, .10 (.00, .38) 
Negative     3                  -.05, .03 (-.09, -.02)     -.03, .02 (-.05, -.01) 

a n = number of repetitions. 
b k = number of workers. 


The relationship between $\sigma_B^2$ and $\theta$ for different values of 
$c$ = $\mu_x/S$ (for $\sigma_W^2$ = .5) is presented in Figure 1. It can be 
seen that when .5 < $c$ < 1.0, $\theta$ has a maximum and then 
decreases, with little sensitivity to $\sigma_B^2$. Therefore, as $\theta$ is 
calculated based on an estimate of $\sigma_B^2$, the estimate of $\theta$ is 
quite stable for $\sigma_B^2$ values that are slightly larger than 
zero. There is a problem in the estimation of $\theta$ when $\sigma_B^2$ is 
near zero because $\theta$ is sensitive to $\sigma_B^2$ in that region and 
because the ANOVA estimate of $\sigma_B^2$ may be negative. 



Figure 1. Relationship of the between-workers variance (Var-b) and 
the probability of overexposure ($\theta$) for different values of $c$ (= $\mu_x$/Standard) 
for the group of workers ($\sigma_W^2$ = .50). 


Since nowadays there is an emphasis on making the 
exposure groups as homogeneous as possible, we may be 
faced with applications that have small values of $\lambda$ (= $\sigma_B^2/\sigma_W^2$). 

Rappaport et al. Method 

Rappaport et al. (1995) recognized the problems of 
negative between-worker variance component estimates 
for estimating overexposure probabilities and for testing for 
compliance to standards. They proposed the following 
alternative estimator. 

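Use the ANOVA estimate if $\hat{\sigma}_B^2$ is positive. Otherwise, 
substitute $\hat{\sigma}_{B,1-\alpha}^2$ for $\hat{\sigma}_B^2$, where $\hat{\sigma}_{B,1-\alpha}^2$ is an approximate 
100(1 - $\alpha$)% upper confidence bound for $\sigma_B^2$, namely 
$P(\sigma_B^2 < \hat{\sigma}_{B,1-\alpha}^2) \approx 1 - \alpha$. The upper bound derived by 
Williams (1962), and cited in Searle et al. (1992), is: 

$\hat{\sigma}_{B,1-\alpha}^2 = \frac{(k - 1)(MSB - F_L\,MSW)}{n\,\chi_{k-1,L}^2}$ 

where 

$P\{F_{k-1,k(n-1)} > F_L\} = 1 - \alpha/2$ 
$P\{\chi_{k-1}^2 > \chi_{k-1,L}^2\} = 1 - \alpha/2$, 

where $F_{k-1,k(n-1)}$ and $\chi_{k-1}^2$ represent random variables 
distributed as $F$ with ($k - 1$) numerator and $k(n - 1)$ 
denominator degrees of freedom and $\chi^2$ with ($k - 1$) 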

Improved non-negative estimation of variance components for exposure assessment 


Peretz and Steinberg 




degrees of freedom, respectively. Rappaport et al. (1995) 
suggested using a 60% confidence bound. In a subsequent 
article, Lyles et al. (1997) used the same basic approach but 
with a 95%, rather than a 60%, approximate upper 
confidence bound for a negative $\hat{\sigma}_B^2$ ANOVA estimate. 
Although the latter article dealt only with hypothesis 
testing, the 95% upper bound could also be used in 
estimating $\theta$. 
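
A minimal sketch of the substitution rule, under the assumption that the Williams bound has the form $(k - 1)(MSB - F_L\,MSW)/(n\,\chi_L^2)$ with $F_L$ and $\chi_L^2$ lower-tail quantiles. To avoid depending on a particular statistics library, the quantiles are passed in as arguments (for example, read from tables); the function names and the numerical quantile values in the usage lines are hypothetical.

```python
def williams_upper_bound(msb, msw, k, n, f_low, chi2_low):
    """Approximate upper confidence bound for the between-worker variance.

    f_low:    lower-tail quantile of F with (k-1, k(n-1)) degrees of freedom
    chi2_low: lower-tail quantile of chi-square with (k-1) degrees of freedom
    Both are supplied by the caller (assumed form of the bound).
    """
    return (k - 1) * (msb - f_low * msw) / (n * chi2_low)


def rappaport_estimate(msb, msw, k, n, f_low, chi2_low):
    """Use the ANOVA estimate if positive; otherwise substitute the bound."""
    anova = (msb - msw) / n
    if anova > 0:
        return anova
    return williams_upper_bound(msb, msw, k, n, f_low, chi2_low)


# Hypothetical quantile values, chosen only to make the arithmetic easy to follow.
print(rappaport_estimate(2.0, 1.0, k=10, n=2, f_low=0.5, chi2_low=5.0))  # ANOVA estimate kept
print(rappaport_estimate(1.0, 1.2, k=10, n=2, f_low=0.5, chi2_low=5.0))  # bound substituted
print(rappaport_estimate(0.5, 1.2, k=10, n=2, f_low=0.5, chi2_low=5.0))  # bound still negative
```

The third call shows why the substitution does not always help: whenever $MSB < F_L\,MSW$, the bound itself is negative.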

Some Drawbacks to the Rappaport et al. Estimator 
We note here two problems with the between-worker 
variance component estimator proposed by Rappaport et al. 
First, the adjustment made to negative ANOVA estimates is 
often insufficient to produce a positive estimate. We 
illustrate this feature later in a simulation study. 

Second, the fact that Rappaport et al.’s estimator only 
corrects negative ANOVA estimates makes it very sensitive 
to small changes in the data. According to Rappaport et al., 
when $\lambda$ (= $\sigma_B^2/\sigma_W^2$) $\approx$ 0.1-0.2, negative estimated values of $\sigma_B^2$ 
could be observed as much as 30-40% of the time when 
$k$ = 10 and 2 $\le n \le$ 4. This probability can be reduced by 
increasing the sample size; however, in reality, many 
occupational hygiene groups are of this order of magnitude, 
having two to four repeated measurements (Kromhout et al., 
1993). 

We illustrate the sensitivity of Rappaport et al.'s 
estimator with a simple example using simulated data with 
$k$ = 10, $n$ = 2, $\sigma_B^2$ = .1 and $\sigma_W^2$ = 1. First, a random set was 
generated and, gradually, eight slight changes were made to 
create eight further sets, each with the same worker averages 
but with increasingly larger within-worker residuals. 

Table 2 presents the variance component estimates 
according to ANOVA and Rappaport et al.’s method with 
the 60% confidence bound. 

From step 7 on, the ANOVA estimate, $\hat{\sigma}_B^2$ ANOVA, was 
negative. The Rappaport et al. estimate for $\sigma_B^2$ makes a 
sudden jump from very small to very large values at step 7 
and thus is quite sensitive to small changes in the study data. 
A slight increase in the within-worker mean square could 
change a positive ANOVA estimate to a negative one, thus 
sharply increasing Rappaport et al.'s estimate. This change 
could lead to a much larger estimate of $\theta$. Since the error 
term in exposure measurements is already known to vary 
greatly over time (contributing to the within-worker 
variability), measuring the same exposure group at different 
times can easily produce negative $\hat{\sigma}_B^2$ ANOVA estimates. 
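
The sign change in the ANOVA estimate can be retraced with a short calculation. The sketch assumes that inflating every within-worker residual by p% while keeping the worker averages fixed leaves MSB unchanged and multiplies MSW by $(1 + p/100)^2$. The starting values are taken from the first row of Table 2 (MSW = .72, ANOVA estimate = .23, $n$ = 2), which imply MSB = .72 + 2(.23); the function name is hypothetical.

```python
def anova_after_inflation(msb, msw_start, n, percent):
    """ANOVA estimates after inflating all residuals by `percent`.

    Worker means are held fixed, so MSB does not change, while MSW
    scales with the square of the inflation factor.
    """
    msw = msw_start * (1 + percent / 100) ** 2
    return msw, (msb - msw) / n


N = 2
MSW_START = 0.72
MSB = MSW_START + N * 0.23  # implied by the first row of Table 2

for step, percent in enumerate(range(0, 45, 5), start=1):
    msw, var_between = anova_after_inflation(MSB, MSW_START, N, percent)
    print(step, percent, round(msw, 2), round(var_between, 2))
```

Because the inputs are rounded table values, the printed figures can differ from Table 2 in the second decimal, but the estimate crosses zero between the 25% and 30% inflation steps, which is where the Rappaport et al. substitution takes over.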

Bias-Adjusted Variance Component Estimation (BAVCE) 
We suggest an alternative estimate to overcome some of the 
limitations of the estimator proposed by Rappaport et al. 
Our method, which we call BAVCE, is based on the upper 
tolerance interval suggested by Wang and Iyer (1994). 

It takes account of the fact that an upper confidence 
bound will typically be biased high and multiplies by a 
factor that attempts to adjust for this bias. The estimator is 
defined as follows: 

$\hat{\sigma}_B^2 = \omega^2 (MSB - F_L\,MSW)/n$ 

where 

$P\{F_{k-1,k(n-1)} \le F_L\} = \eta$ for $\eta = (1 - \gamma)/n$, 

where $\gamma$ is the confidence level (which we have taken to be 
.95), $\omega^2 = \hat{\phi}/[1 - (1 - \hat{\phi})F_L]$ and 

$\hat{\phi} = \max(0, 1 - F_L\,MSW/MSB)$. 
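
A minimal sketch of the BAVCE calculation, assuming the definitions read as $\hat{\phi} = \max(0, 1 - F_L\,MSW/MSB)$, $\omega^2 = \hat{\phi}/[1 - (1 - \hat{\phi})F_L]$ and estimate $= \omega^2(MSB - F_L\,MSW)/n$. The quantile $F_L$ is passed in by the caller rather than computed, and the function name and the numbers in the usage lines are hypothetical.

```python
def bavce(msb, msw, n, f_low):
    """Bias-adjusted variance component estimate (sketch under stated assumptions).

    f_low: lower-tail F quantile supplied by the caller; it must lie in (0, 1)
           so that the denominator of omega^2 stays positive.
    """
    if not 0.0 < f_low < 1.0:
        raise ValueError("f_low is expected to be a lower-tail quantile in (0, 1)")
    if msb <= 0:
        raise ValueError("MSB must be positive")

    phi_hat = max(0.0, 1.0 - f_low * msw / msb)
    omega2 = phi_hat / (1.0 - (1.0 - phi_hat) * f_low)
    return omega2 * (msb - f_low * msw) / n


# ANOVA estimate is negative here ((1.0 - 1.2) / 2), but the BAVCE is positive.
print(bavce(msb=1.0, msw=1.2, n=2, f_low=0.5))
# When MSB < F_L * MSW, phi_hat = 0 and the estimate is zero rather than negative.
print(bavce(msb=0.5, msw=1.2, n=2, f_low=0.5))
```

Under this reading the estimate can never be negative: whenever $MSB - F_L\,MSW$ is negative, $\hat{\phi}$ and hence $\omega^2$ are zero.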

The BAVCE estimator, like the others (Rappaport et al., 
1995; Lyles et al., 1997), reduces the frequency of negative 
or zero estimates by subtracting less than the full value of 
MSW from MSB. However, their use of an upper 
confidence bound as an estimator almost guarantees an 
overestimate of $\sigma_B^2$. The factor $\omega^2$ in the BAVCE attempts to 
correct the upward bias. To see how the bias correction 

Table 2. Sensitivity of $\sigma_B^2$, $\sigma_W^2$ estimators to small changes in values of a set of data. 

Set   Percent inflation   $\hat{\sigma}_W^2$ ANOVA   $\hat{\sigma}_B^2$ ANOVA   $\hat{\sigma}_{B,1-\alpha}^2$ ^a 
      of residuals 

1     random set          .72                  .23                  .23 
2     5                   .80                  .20                  .20 
3     10                  .87                  .16                  .16 
4     15                  .96                  .12                  .12 
5     20                  1.04                 .07                  .07 
6     25                  1.13                 .03                  .03 
7     30                  1.22                 -.02                 3.47 
8     35                  1.32                 -.06                 3.45 
9     40                  1.42                 -.11                 3.43 

n = 2; k = 10; original: $\sigma_B^2$ = .1, $\sigma_W^2$ = 1. 
a According to Rappaport et al.'s method, based on upper bound of 60%. 

works, we present an approximation to the expected value 
of $\hat{\sigma}_B^2$ by assuming that $\phi = 1 - [\sigma_W^2/(n\sigma_B^2 + \sigma_W^2)]$ is known. 
Then: 

$\omega^2 = \frac{n\sigma_B^2/(n\sigma_B^2 + \sigma_W^2)}{1 - F_L\,\sigma_W^2/(n\sigma_B^2 + \sigma_W^2)} = \frac{n\sigma_B^2}{n\sigma_B^2 + (1 - F_L)\sigma_W^2}$ 

and 

$E\{\hat{\sigma}_B^2\} = \omega^2 E\{MSB - F_L\,MSW\}/n = \omega^2 [n\sigma_B^2 + (1 - F_L)\sigma_W^2]/n = \sigma_B^2$ 

The bias correction is implemented by using a "plug-in" 
estimator of $\phi$ in which the observed mean squares replace 
their expected values. 


Comparison of estimators on simulated data 



Simulated Data 

Simulations were run to compare the different estimators 
of $\sigma_B^2$ and $\theta$. The estimators of $\sigma_B^2$ were the ANOVA 
estimator, the estimator of Rappaport et al. with a 60% 
bound (method 1) and with a 95% bound (method 1A) 
and the BAVCE proposed here (method 2). The 
estimators of $\theta$ were generated by plugging the estimators 

Table 3. Comparison of estimation methods based on simulations of 1000 sets for 10 workers with 2, 3, 4 repetitions. 

A. Results for negative LS estimators of $\sigma_B^2$. 

Repetitions                      $\hat{\sigma}_B^2$ (original = .20)                                  $\hat{\theta}$ (original = .34) 
(number of series)               LS method     Method 1      Method 1A     Method 2      Method 1      Method 2 

2^a (473)         mean, SD       -.23, .19     .06, .23      .51, .37      .16, .11      .32, .03      .31, .06 
                  min, max       -1.13, .00    .24, 6.14     -.55, 1.83    .00, .58      .00, .34      .00, .34 
3^b (478)         mean, SD       -.13, .10     .05, .13      .32, .23      .11, .07      .31, .05      .32, .05 
                  min, max       -.61, .00     -.47, .43     -.33, 1.21    .00, .41      .04, .34      .00, .34 
4^c (413)         mean, SD       -.09, .07     .04, .10      .24, .16      .09, .05      .30, .06      .31, .06 
                  min, max       -.41, .00     -.30, .26     -.19, .67     .00, .24      .00, .34      .00, .34 

LS method = least squares method; method 1 = Rappaport et al.'s method based on upper bound of 60%; method 1A = Rappaport et al.'s method based on 
upper bound of 95%; method 2 = our method based on a modified upper bound. 

B. Results for positive LS estimators of $\sigma_B^2$. 

Repetitions                      $\hat{\sigma}_B^2$ (original = .20)                 $\hat{\theta}$ (original = .34) 
(number of series)               Method 1 (LS method)   Method 2       Method 1      Method 2 

2 (327)           mean, SD       .27, .21               .49, .21       .31, .05      .32, .02 
                  min, max       .00, 1.14              .11, 1.30      .00, .34      .25, .34 
3 (522)           mean, SD       .17, .14               .34, .14       .31, .06      .33, .01 
                  min, max       .00, .76               .10, .93       .00, .34      .28, .34 
4 (587)           mean, SD       .14, .11               .28, .11       .30, .06      .33, .01 
                  min, max       .00, .53               .08, .63       .00, .34      .31, .34 

LS method = least squares method; method 1 = Rappaport et al.'s method; method 2 = our method based on a modified upper bound. 

a 298/473 positive according to method 1; 437/473 positive according to method 1A; 463/473 positive according to method 2. 
b 318/478 positive according to method 1; 445/478 positive according to method 1A; 462/478 positive according to method 2. 
c 293/413 positive according to method 1; 386/413 positive according to method 1A; 408/413 positive according to method 2. 
Table 4. Comparison of estimation methods based on simulations of 1000 sets for 10 workers with 2, 3, 4 repetitions. 

A. Results for negative LS estimators of $\sigma_B^2$. 

Repetitions                      $\hat{\sigma}_B^2$ (original = .05)                                  $\hat{\theta}$ (original = .31) 
(number of series)               LS method     Method 1      Method 1A     Method 2      Method 1      Method 2 

2^a (505)         mean, SD       -.25, .19     .04, .24      .47, .37 
                  min, max       -1.11, .00    -.87, .72     -.52, 1.82 
3^b (538)         mean, SD       -.14, .10     .03, .14      .29, .22 
                  min, max       -.62, .00     -.48, .40     -.33, 1.15 
4^c (510)         mean, SD       -.09, .07     .03, .10      .22, .15 
                  min, max       - T9, .00     -.28, .24     -.18, .63 

LS method = least squares method; method 1 = Rappaport et al.'s method based on upper bound of 60%; method 1A = Rappaport et al.'s method based on upper 
bound of 95%; method 2 = our method based on a modified upper bound. 

B. Results for positive LS estimators of $\sigma_B^2$. 

Repetitions                      $\hat{\sigma}_B^2$ (original = .05)                 $\hat{\theta}$ (original = .31) 
(number of series)               Method 1 (LS method)   Method 2       Method 1      Method 2 

2                 mean, SD       .24, .20               .46, .20       .31, .05      .32, .02 
                  min, max       .00, 1.05              .10, 1.16      .00, .34      .27, .34 
3                 mean, SD                              .32, .12       .31, .05      .33, .01 
                  min, max                              .11, .93       .00, .34      .28, .34 
4                 mean, SD                              .25, .10       .30, .07      .33, .01 
                  min, max                              .08, .60       .00, .34      .31, .34 

LS method = least squares method; method 1 = Rappaport et al.'s method; method 2 = our method based on a modified upper bound. 

a 303/505 positive according to method 1; 462/505 positive according to method 1A; 485/505 positive according to method 2. 
b 336/538 positive according to method 1; 485/538 positive according to method 1A; 521/538 positive according to method 2. 
c 326/510 positive according to method 1; 469/510 positive according to method 1A; 501/510 positive according to method 2. 


of $\sigma_B^2$ along with the ANOVA estimator of $\sigma_W^2$ and the 
sample average into Eq. (1) in Section 4. The 
simulations covered three different practical settings 
defined by the number of repetitions ($n$) and the number 
of subjects ($k$): 

(i) 1000 data sets for $k$ = 10, $n$ = 2 (20000 observations); 

(ii) 1000 data sets for $k$ = 10, $n$ = 3 (30000 observations); 
and 

(iii) 1000 data sets for $k$ = 10, $n$ = 4 (40000 observations). 

In addition, we examined several different values of $\sigma_B^2$. 

The within-subject variance $\sigma_W^2$ was held constant at 1 in all 
the simulations. Original values for $\phi$ for $n$ = 2, 3, 4 were 
computed from Eq. (1). When the least squares estimate of 
the between-workers variance component $\sigma_B^2$ was negative, 
method 1 modified it to a larger value. The method 2 
estimator increased all the $\sigma_B^2$ estimates, not just the 
negative ones. 
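
One of the simulation settings can be sketched as follows. This is not the authors' simulation code: the random number generator, the seed, the choice $\mu_y = 0$ (which does not affect the variance estimates) and the function name are all assumptions made for illustration. The sketch generates balanced data sets from the random effects model and counts how often the ANOVA estimate of the between-worker variance is zero or negative.

```python
import random


def fraction_nonpositive(k=10, n=2, var_between=0.2, var_within=1.0,
                         n_sets=1000, seed=0):
    """Share of simulated data sets whose ANOVA between-worker estimate is <= 0."""
    rng = random.Random(seed)
    sd_b = var_between ** 0.5
    sd_w = var_within ** 0.5
    count = 0
    for _ in range(n_sets):
        # One data set: k workers, n repeated log-exposures each (mu_y = 0).
        data = []
        for _ in range(k):
            alpha = rng.gauss(0.0, sd_b)  # worker random effect
            data.append([alpha + rng.gauss(0.0, sd_w) for _ in range(n)])

        means = [sum(row) / n for row in data]
        grand = sum(means) / k
        msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
        msw = sum((y - m) ** 2 for row, m in zip(data, means) for y in row) / (k * (n - 1))
        if msb - msw <= 0:
            count += 1
    return count / n_sets


for n_rep in (2, 3, 4):
    print(n_rep, fraction_nonpositive(n=n_rep))
```

The resulting fractions depend on the seed and generator, so they should not be expected to reproduce the counts reported in Tables 3 and 4; the sketch only illustrates how frequently non-positive estimates arise when the variance ratio is small.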

Comparison of Estimators 

Tables 3 and 4 present the estimators based on the simulated 
data for $n$ = 2, 3, 4, when original $\sigma_B^2$ = .2 (Table 3) or $\sigma_B^2$ = .05 
(Table 4), which are representative of the results that we 
found for all the values of $\sigma_B^2$. 

Tables 3A and 4A relate to the estimators when negative 
ANOVA estimates were found. Tables 3B and 4B relate to 
the estimators when positive ANOVA estimates were found. 
As was found previously, the ANOVA estimator of $\sigma_B^2$ was 
often negative for the cases we studied. In Table 3A ($\sigma_B^2$ = .2, 
$\lambda$ = .2), we can see that more than 40% of the data sets for 
$n$ = 2, 3, 4 resulted in a negative ANOVA estimate and in 
The modified variance component estimator for negative 
values proposed by Rappaport et al. (1995) and Lyles et al. 
(1997) has three main disadvantages: 

1. It remains negative for some data sets. 

2. It performs poorly in estimating $\sigma_B^2$ and also $\theta$ when 
$n$ = 2 and when the original $\sigma_B^2$ is quite small and 
0.5 × Standard < $\mu_x$ < 1.0 × Standard, which are quite 
common situations when studying exposure. 

3. Discontinuous behavior: small changes in the data set 
can make the ANOVA estimator negative, resulting in 
the use of the modification, which may cause a large 
change in the conclusions of a study. 

In this paper, we have proposed an alternative variance 
component estimator, the BAVCE, to cope with the problem 
of negative and zero between-worker ANOVA estimates. 
Our modification seems to react better than the estimator of 
Rappaport et al., as can be seen in the tables from our 
simulations and the simulated subsets of data. 

We think that further thought should be given to analysis 
of data from unbalanced designs, which are common in 
real-life exposure data sets due to absence of workers and 
changes in work practices. 

Here, exposure was measured in industry and agriculture. 
The same ideas can be applied to environmental exposure 
within the community. 